AITopics | document classification

Collaborating Authors

document classification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Efficient Orlicz-Sobolev Approach for Transporting Unbalanced Measures on a Graph

Neural Information Processing SystemsJun-18-2026, 13:23:19 GMT

We investigate optimal transport (OT) for measures on graph metric spaces with different total masses. To mitigate the limitations of traditional Lp geometry, Orlicz-Wasserstein (OW) and generalized Sobolev transport (GST) employ Orlicz geometric structure, leveraging convex functions to capture nuanced geometric relationships and remarkably contribute to advance certain machine learning approaches. However, both OW and GST are restricted to measures with equal total mass, limiting their applicability to real-world scenarios where mass variation is common, and input measures may have noisy supports, or outliers. To address unbalanced measures, OW can either incorporate mass constraints or marginal discrepancy penalization, but this leads to a more complex two-level optimization problem. Additionally, GST provides a scalable yet rigid framework, which poses significant challenges to extend GST to accommodate nonnegative measures.

artificial intelligence, machine learning, orlicz-ept, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

Tree-Sliced Variants of Wasserstein Distances

Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi

Neural Information Processing SystemsFeb-11-2026, 19:17:02 GMT

Neural Information Processing Systems http://nips.cc/

dataset, kernel, tree metric, (10 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

bda5c35eded86adaf0231748e3ce071c-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 16:23:08 GMT

minibatch, semantic complement and meta-learning, singapore management university singapore 178902, (9 more...)

Neural Information Processing Systems

Country: Asia > Singapore (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Evaluating Out-of-Distribution Performance on Document Image Classifiers

Neural Information Processing SystemsDec-24-2025, 04:16:55 GMT

The ability of a document classifier to handle inputs that are drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge all studies that use this corpus do not include evaluation on out-of-distribution documents. In this paper, we curate and release a new out-of-distribution benchmark for evaluating out-of-distribution performance for document classifiers. Our new out-of-distribution benchmark consists of two types of documents: those that are not part of any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that are one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification for in-domain RVL-CDIP documents reports high accuracy scores, we find that these models exhibit accuracy drops of between roughly 15-30% on our new out-of-domain RVL-CDIP-N benchmark, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. Our new benchmark provides researchers with a valuable new resource for analyzing out-of-distribution performance on document classifiers.

document image classifier, name change, out-of-distribution performance, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.62)

Add feedback

ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai

Nonesung, Surapon, Jaknamon, Teetouch, Chaiophat, Sirinya, Nitarach, Natapong, Wittayasakpan, Chanakan, Sirichotedumrong, Warit, Na-Thalang, Adisai, Pipatanakul, Kunat

arXiv.org Artificial IntelligenceDec-5-2025

We present ThaiOCRBench, the first comprehensive benchmark for evaluating vision-language models (VLMs) on Thai text-rich visual understanding tasks. Despite recent progress in multimodal modeling, existing benchmarks predominantly focus on high-resource languages, leaving Thai underrepresented, especially in tasks requiring document structure understanding. ThaiOCRBench addresses this gap by offering a diverse, human-annotated dataset comprising 2,808 samples across 13 task categories. We evaluate a wide range of state-of-the-art VLMs in a zero-shot setting, spanning both proprietary and open-source systems. Results show a significant performance gap, with proprietary models (e.g., Gemini 2.5 Pro) outperforming open-source counterparts. Notably, fine-grained text recognition and handwritten content extraction exhibit the steepest performance drops among open-source models. Through detailed error analysis, we identify key challenges such as language bias, structural mismatch, and hallucinated content. ThaiOCRBench provides a standardized framework for assessing VLMs in low-resource, script-complex settings, and provides actionable insights for improving Thai-language document understanding.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.04479

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

SemImage: Semantic Image Representation for Text, a Novel Framework for Embedding Disentangled Linguistic Features

Zare, Mohammad

arXiv.org Artificial IntelligenceDec-2-2025

We propose SemImage, a novel method for representing a text document as a two-dimensional semantic image to be processed by convolutional neural networks (CNNs). In a SemImage, each word is represented as a pixel in a 2D image: rows correspond to sentences and an additional boundary row is inserted between sentences to mark semantic transitions. Each pixel is not a typical RGB value but a vector in a disentangled HSV color space, encoding different linguistic features: the Hue with two components H_cos and H_sin to account for circularity encodes the topic, Saturation encodes the sentiment, and Value encodes intensity or certainty. We enforce this disentanglement via a multi-task learning framework: a ColorMapper network maps each word embedding to the HSV space, and auxiliary supervision is applied to the Hue and Saturation channels to predict topic and sentiment labels, alongside the main task objective. The insertion of dynamically computed boundary rows between sentences yields sharp visual boundaries in the image when consecutive sentences are semantically dissimilar, effectively making paragraph breaks salient. We integrate SemImage with standard 2D CNNs (e.g., ResNet) for document classification. Experiments on multi-label datasets (with both topic and sentiment annotations) and single-label benchmarks demonstrate that SemImage can achieve competitive or better accuracy than strong text classification baselines (including BERT and hierarchical attention networks) while offering enhanced interpretability. An ablation study confirms the importance of the multi-channel HSV representation and the dynamic boundary rows. Finally, we present visualizations of SemImage that qualitatively reveal clear patterns corresponding to topic shifts and sentiment changes in the generated image, suggesting that our representation makes these linguistic features visible to both humans and machines.

machine learning, natural language, text classification, (21 more...)

arXiv.org Artificial Intelligence

2512.00088

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Generalized Sobolev IPM for Graph-Based Measures

Le, Tam, Nguyen, Truyen, Hino, Hideitsu, Fukumizu, Kenji

arXiv.org Machine LearningOct-30-2025

We study the Sobolev IPM problem for measures supported on a graph metric space, where critic function is constrained to lie within the unit ball defined by Sobolev norm. While Le et al. (2025) achieved scalable computation by relating Sobolev norm to weighted $L^p$-norm, the resulting framework remains intrinsically bound to $L^p$ geometric structure, limiting its ability to incorporate alternative structural priors beyond the $L^p$ geometry paradigm. To overcome this limitation, we propose to generalize Sobolev IPM through the lens of \emph{Orlicz geometric structure}, which employs convex functions to capture nuanced geometric relationships, building upon recent advances in optimal transport theory -- particularly Orlicz-Wasserstein (OW) and generalized Sobolev transport -- that have proven instrumental in advancing machine learning methodologies. This generalization encompasses classical Sobolev IPM as a special case while accommodating diverse geometric priors beyond traditional $L^p$ structure. It however brings up significant computational hurdles that compound those already inherent in Sobolev IPM. To address these challenges, we establish a novel theoretical connection between Orlicz-Sobolev norm and Musielak norm which facilitates a novel regularization for the generalized Sobolev IPM (GSI). By further exploiting the underlying graph structure, we show that GSI with Musielak regularization (GSI-M) reduces to a simple \emph{univariate optimization} problem, achieving remarkably computational efficiency. Empirically, GSI-M is several-order faster than the popular OW in computation, and demonstrates its practical advantages in comparing probability measures on a given graph for document classification and several tasks in topological data analysis.

machine learning, natural language, sobolev ipm, (18 more...)

arXiv.org Machine Learning

2510.25591

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Comparison of Machine Learning Models to Classify Documents on Digital Development

Ranaweera, Uvini, Mawitagama, Bawun, Liyanage, Sanduni, Keshan, Sandupa, de Silva, Tiloka, Hewawalpita, Supun

arXiv.org Artificial IntelligenceOct-3-2025

Automated document classification is a trending topic in Natural Language Processing (NLP) due to the extensive growth in digital databases. However, a model that fits well for a specific classification task might perform weakly for another dataset due to differences in the context. Thus, training and evaluating several models is necessary to optimise the results. This study employs a publicly available document database on worldwide digital development interventions categorised under twelve areas. Since digital interventions are still emerging, utilising NLP in the field is relatively new. Given the exponential growth of digital interventions, this research has a vast scope for improving how digital-development-oriented organisations report their work. The paper examines the classification performance of Machine Learning (ML) algorithms, including Decision Trees, k-Nearest Neighbors, Support Vector Machine, AdaBoost, Stochastic Gradient Descent, Naive Bayes, and Logistic Regression. Accuracy, precision, recall and F1-score are utilised to evaluate the performance of these models, while oversampling is used to address the class-imbalanced nature of the dataset. Deviating from the traditional approach of fitting a single model for multiclass classification, this paper investigates the One vs Rest approach to build a combined model that optimises the performance. The study concludes that the amount of data is not the sole factor affecting the performance; features like similarity within classes and dissimilarity among classes are also crucial.

artificial intelligence, classification, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-99-7969-1_5

2510.0072

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.96)
Government > Regional Government > North America Government > United States Government (0.70)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 19:26:18 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes to learn bilingual word vector representations through an autoencoder. The novelty of this approach is to not rely on word-level alignments. It only requires aligned sentences. An autoencoder model is used to reconstruct the bag-of-words representation of aligned sentences, within and between languages.

experiment, hermann and blunsom, representation, (11 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: